Towards Learning Rules from Natural Texts

نویسندگان

  • Janardhan Rao Doppa
  • Mohammad NasrEsfahani
  • Mohammad S. Sorower
  • Thomas G. Dietterich
  • Xiaoli Fern
  • Prasad Tadepalli
چکیده

In this paper, we consider the problem of inductively learning rules from specific facts extracted from texts. This problem is challenging due to two reasons. First, natural texts are radically incomplete since there are always too many facts to mention. Second, natural texts are systematically biased towards novelty and surprise, which presents an unrepresentative sample to the learner. Our solutions to these two problems are based on building a generative observation model of what is mentioned and what is extracted given what is true. We first present a Multiple-predicate Bootstrapping approach that consists of iteratively learning if-then rules based on an implicit observation model and then imputing new facts implied by the learned rules. Second, we present an iterative ensemble colearning approach, where multiple decisiontrees are learned from bootstrap samples of the incomplete training data, and facts are imputed based on weighted majority.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Rules from Incomplete Examples: A Pragmatic Approach

In this paper, we consider the problem of inductively learning rules from specific facts extracted from texts. This problem is challenging due to two reasons. First, natural texts are radically incomplete since there are always too many facts to mention. Second, natural texts are systematically biased towards novelty and surprise, which presents an unrepresentative sample to the learner. Our so...

متن کامل

Structuring Natural Language Data by Learning Rewriting Rules

The discovery of relationships between concepts is a crucial point in ontology learning (OL). In most cases, OL is achieved from a collection of domain-specific texts, describing the concepts of the domain and their relationships. A natural way to represent the description associated to a particular text is to use a structured term (or tree). We present a method for learning transformation rule...

متن کامل

Inverting Grice's Maxims to Learn Rules from Natural Language Extractions

We consider the problem of learning rules from natural language text sources. These sources, such as news articles and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We...

متن کامل

Mention Model for Learning Rules from Incomplete Examples

Introduction. We are motivated by the problem of learning rules from naturally available data sources such as natural language texts, web pages, and medical databases. At first, learning rules from natural sources like the web seems to consist of extracting specific facts followed by data mining of rules. Unfortunately, however, there are two major obstacles to fully realizing the dream of unli...

متن کامل

Learning Rules from Incomplete Examples via a Probabilistic Mention Model

We consider the problem of learning rules from natural language text sources. These sources, such as news articles, journal articles, and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the corr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010